skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Gao, Ming"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Trajectory inference methods are essential for analyzing the developmental paths of cells in single-cell sequencing datasets. It provides insights into cellular differentiation, transitions, and lineage hierarchies, helping unravel the dynamic processes underlying development and disease progression. However, many existing tools lack a coherent statistical model and reliable uncertainty quantification, limiting their utility and robustness. In this paper, we introduce VITAE (Variational Inference for Trajectory by AutoEncoder), a statistical approach that integrates a latent hierarchical mixture model with variational autoencoders to infer trajectories. The statistical hierarchical model enhances the interpretability of our framework, while the posterior approximations generated by our variational autoencoder ensure computational efficiency and provide uncertainty quantification of cell projections along trajectories. Specifically, VITAE enables simultaneous trajectory inference and data integration, improving the accuracy of learning a joint trajectory structure in the presence of biological and technical heterogeneity across datasets. We show that VITAE outperforms other state-of-the-art trajectory inference methods on both real and synthetic data under various trajectory topologies. Furthermore, we apply VITAE to jointly analyze three distinct single-cell RNA sequencing datasets of the mouse neocortex, unveiling comprehensive developmental lineages of projection neurons. VITAE effectively reduces batch effects within and across datasets and uncovers finer structures that might be overlooked in individual datasets. Additionally, we showcase VITAE’s efficacy in integrative analyses of multiomic datasets with continuous cell population structures. 
    more » « less
  2. We develop optimal algorithms for learning undirected Gaussian trees and directed Gaussian polytrees from data. We consider both problems of distribution learning (i.e. in KL distance) and structure learning (i.e. exact recovery). The first approach is based on the Chow-Liu algorithm, and learns an optimal tree-structured distribution efficiently. The second approach is a modification of the PC algorithm for polytrees that uses partial correlation as a conditional independence tester for constraint-based structure learning. We derive explicit finite-sample guarantees for both approaches, and show that both approaches are optimal by deriving matching lower bounds. Additionally, we conduct numerical experiments to compare the performance of various algorithms, providing further insights and empirical evidence. 
    more » « less
  3. null (Ed.)
    We establish finite-sample guarantees for a polynomial-time algorithm for learning a nonlinear, nonparametric directed acyclic graphical (DAG) model from data. The analysis is model-free and does not assume linearity, additivity, independent noise, or faithfulness. Instead, we impose a condition on the residual variances that is closely related to previous work on linear models with equal variances. Compared to an optimal algorithm with oracle knowledge of the variable ordering, the additional cost of the algorithm is linear in the dimension d and the number of samples n. Finally, we compare the proposed algorithm to existing approaches in a simulation study. 
    more » « less
  4. null (Ed.)
  5. The assembly of large RNA-protein granules occurs in germ cells of many animals and these germ granules have provided a paradigm to study structure-functional aspects of similar structures in different cells. Germ granules in Drosophila oocyte’s posterior pole (polar granules) are composed of RNA, in the form of homotypic clusters, and proteins required for germline development. In the granules, Piwi protein Aubergine binds to a scaffold protein Tudor, which contains 11 Tudor domains. Using a super-resolution microscopy, we show that surprisingly, Aubergine and Tudor form distinct clusters within the same polar granules in early Drosophila embryos. These clusters partially overlap and, after germ cells form, they transition into spherical granules with the structural organization unexpected from these interacting proteins: Aubergine shell around the Tudor core. Consistent with the formation of distinct clusters, we show that Aubergine forms homo-oligomers and using all purified Tudor domains, we demonstrate that multiple domains, distributed along the entire Tudor structure, interact with Aubergine. Our data suggest that in polar granules, Aubergine and Tudor are assembled into distinct phases, partially mixed at their “interaction hubs”, and that association of distinct protein clusters may be an evolutionarily conserved mechanism for the assembly of germ granules. 
    more » « less